In this R Markdown document we describe the simulation process involved in Elo ranking for squad-level peer evaluations. Because the data on actual cadets is private, we simulate a squad and their preferences. Additionally, we introduce a basic sentiment analysis method used to calculate positive and negative sentiments. Note: this is a static file; if you wish to edit the code for learning purposes, please see our Deepnote Notebook.
To simulate a squad we need to represent people and their preferences in data structures native to R. For the remainder of this document, we refer to the squad size as \(n\). At the highest level, the squad and their preferences are stored in a named list of length \(n\), where each element is indexed by the name of a squad member. Each element of the named list is an \(n\) x \(n\) matrix representing the preferences of the indexed squad member. The code below creates such a named list for a squad with \(n = 6\).
# First let's name all our squad members
team <- c("nico", "mia", "samar", "jeff", "joe", "eva")
#Create a truth matrix for the first squad member (details to follow)
truth1 <- matrix(c(NA,NA,NA,NA,NA,NA,
                    0, 0, 1, 1, 1, 1,
                    0, 0, 0, 1, 1, 1,
                    0, 0, 0, 0, 1, 1,
                    0, 0, 0, 0, 0, 1,
                    0, 0, 0, 0, 0, 0),
                  nrow = 6, byrow = TRUE)
#assign dimnames to reference the truth matrix by name
dimnames(truth1) <- list(team, team)
#Repeat for the other squad members
truth2 <- matrix(c( 0, 1, 1, 1, 1, 1,
                   NA,NA,NA,NA,NA,NA,
                    0, 0, 0, 1, 1, 1,
                    0, 0, 0, 0, 1, 1,
                    0, 0, 0, 0, 0, 1,
                    0, 0, 0, 0, 0, 0),
                  nrow = 6, byrow = TRUE)
dimnames(truth2) <- list(team, team)
truth3 <- matrix(c( 0, 1, 1, 1, 1, 1,
                    0, 0, 1, 1, 1, 1,
                   NA,NA,NA,NA,NA,NA,
                    0, 0, 0, 0, 1, 1,
                    0, 0, 0, 0, 0, 1,
                    0, 0, 0, 0, 0, 0),
                  nrow = 6, byrow = TRUE)
dimnames(truth3) <- list(team, team)
truth4 <- matrix(c( 0, 1, 1, 1, 1, 1,
                    0, 0, 1, 1, 1, 1,
                    0, 0, 0, 1, 1, 1,
                   NA,NA,NA,NA,NA,NA,
                    0, 0, 0, 0, 0, 1,
                    0, 0, 0, 0, 0, 0),
                  nrow = 6, byrow = TRUE)
dimnames(truth4) <- list(team, team)
truth5 <- matrix(c( 0, 1, 1, 1, 1, 1,
                    0, 0, 1, 1, 1, 1,
                    0, 0, 0, 1, 1, 1,
                    0, 0, 0, 0, 1, 1,
                   NA,NA,NA,NA,NA,NA,
                    0, 0, 0, 0, 0, 0),
                  nrow = 6, byrow = TRUE)
dimnames(truth5) <- list(team, team)
truth6 <- matrix(c( 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0,
                   NA,NA,NA,NA,NA,NA),
                  nrow = 6, byrow = TRUE)
dimnames(truth6) <- list(team, team)
#combine all the truth matrices into a list
truth_list <- list(truth1, truth2, truth3, truth4, truth5, truth6)
#turn the list into the named list!
names(truth_list) <- team
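The truth matrices above share a common pattern: the upper triangle is 1, the rater's own row is NA, and everything else is 0. As a sketch, that repetition can be generated programmatically (the helper make_truth is ours, not part of the original code); note that truth6 intentionally breaks the pattern, so it would still be written by hand:

```r
# squad names, as above
team <- c("nico", "mia", "samar", "jeff", "joe", "eva")

#Sketch: build a "strict ranking" truth matrix for a given rater.
#Each member no-doubt beats everyone later in `team` (upper triangle = 1),
#and the rater's own row is NA because they never rate their own match-ups.
make_truth <- function(team, rater) {
  n <- length(team)
  m <- matrix(0, nrow = n, ncol = n, dimnames = list(team, team))
  m[upper.tri(m)] <- 1
  m[rater, ] <- NA
  m
}

#Reproduces truth1 through truth5; truth6 differs and stays hand-written
truth_list_alt <- lapply(team[1:5], function(r) make_truth(team, r))
names(truth_list_alt) <- team[1:5]
```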
Let’s briefly discuss how we formalize the truth matrix for each squad member. Each truth matrix represents the respective squad member’s preferences. Comparisons in the truth matrix are made in a row-to-column fashion. Consider truth1, which represents the preferences of nico. If nico believes that mia is no-doubt better than samar, then \(table_{2,3}\) is 1. If nico believes that mia is no-doubt worse than samar, then \(table_{2,3}\) is 0. If there is any uncertainty, a proportion is assigned to the comparison. If \(table_{2,3}\) is .5, then nico thinks that mia and samar are equal and will vote completely randomly each time. If .5 < \(table_{2,3}\) < 1, then nico thinks that mia is better than samar, but not decisively. Finally, if 0 < \(table_{2,3}\) < .5, then nico thinks that samar is better than mia, but not decisively.
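To see what an uncertain comparison means in practice, here is a quick standalone sketch (the .8 value is arbitrary, chosen for illustration): if nico’s truth-table entry for mia versus samar were .8, then over repeated match-ups nico would pick mia roughly 80% of the time.

```r
set.seed(42)
#nico thinks mia beats samar, but not decisively: an .8 entry in the truth matrix
prob <- .8
#simulate 1000 picks for this single match-up
picks <- sample(c("mia", "samar"), size = 1000, replace = TRUE,
                prob = c(prob, 1 - prob))
#roughly 800 picks for mia and 200 for samar
table(picks)
```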
The structure of the truth matrix is also important. For instance, for nico the first row is NA because nico is never asked to evaluate himself against other squad members, so these values are never used. Additionally, the diagonal of the matrix is 0 because, due to the nature of the ranking/scoring process, squad members are never asked to compare a person with themselves. For example, nico is never asked to pick between mia and mia.
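As a quick sanity check on these invariants, a small helper (check_truth is our own, hypothetical addition) can verify that each matrix has an all-NA row for its rater and zeros on the rest of the diagonal:

```r
#Sketch: verify the structural invariants of a truth-matrix list:
#the rater's own row is all NA, and the diagonal is 0 elsewhere
check_truth <- function(truth_list) {
  vapply(names(truth_list), function(nm) {
    m <- truth_list[[nm]]
    all(is.na(m[nm, ])) && all(diag(m) == 0, na.rm = TRUE)
  }, logical(1))
}
```

Applied to the truth_list built above, this should return TRUE for every squad member.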
With our fake squad representation in hand, we can get into the nitty-gritty of the Elo scoring method.
The function elo takes the named-list discussed above
and the parameter \(k\). The parameter
\(k\) represents the K-factor discussed
in the full write-up. In the base Elo function, we generate the unique
\(\frac{n(n-1)}{2}\) match-ups, shuffle
the match-ups in a random order, filter down the matches such that squad
members will not make picks in match-ups that include themselves,
simulate picks from the truth-table, and update Elo scores.
elo <- function(truth_table_list, k) {
  #Extract team names, make and shuffle match-ups
  team <- names(truth_table_list)
  matchups <- t(combn(team, 2))
  matchups <- matchups[sample(nrow(matchups)), ]
  #init score as a named vector
  score <- rep(1000, length(team))
  names(score) <- team
  #loop through each team member
  for (t in team) {
    #filter out matches involving t and get t's truth table
    fil_matches <- matchups[matchups[, 1] != t & matchups[, 2] != t, ]
    truth_table <- truth_table_list[[t]]
    #complete the lower triangle (P(j beats i) = 1 - P(i beats j))
    #so look-ups work regardless of match-up order
    truth_table[lower.tri(truth_table)] <- 1 - t(truth_table)[lower.tri(truth_table)]
    #loop through each match index
    for (mi in 1:nrow(fil_matches)) {
      #get the match-up
      match <- fil_matches[mi, ]
      #pick winner and loser based on the truth-table entry
      if (truth_table[match[1], match[2]] == 1) {
        winner <- match[1]
        loser <- match[2]
      } else if (truth_table[match[1], match[2]] == 0) {
        winner <- match[2]
        loser <- match[1]
      } else {
        prob <- truth_table[match[1], match[2]]
        win_index <- sample(1:2, 1, prob = c(prob, 1 - prob))
        winner <- match[win_index]
        loser <- match[ifelse(win_index == 1, 2, 1)]
      }
      #Calc expected scores
      Ew <- 1 / (1 + 10^((score[loser] - score[winner]) / 400))
      El <- 1 / (1 + 10^((score[winner] - score[loser]) / 400))
      #Update the rating for each player
      Rw <- score[winner] + k * (1 - Ew)
      Rl <- score[loser] + k * (0 - El)
      score[winner] <- Rw
      score[loser] <- Rl
    }
  }
  #Return named vector of one iteration
  return(score)
}
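To make the update step concrete, here is a standalone sketch of a single Elo update between two ratings (the helper elo_update and its arguments are ours, not part of the scoring code). Because the loser's expected score is \(1 - E_w\), both ratings move by the same amount and the total rating is conserved. With equal ratings the winner's expected score is .5, so with \(k = 30\) the winner gains exactly 15 points and the loser loses 15:

```r
#One Elo update: winner currently rated Rw, loser rated Rl, K-factor k
elo_update <- function(Rw, Rl, k) {
  Ew <- 1 / (1 + 10^((Rl - Rw) / 400))  #expected score of the winner
  #the loser's expected score is 1 - Ew, so both sides move by k * (1 - Ew)
  c(winner = Rw + k * (1 - Ew),
    loser  = Rl - k * (1 - Ew))
}

elo_update(1000, 1000, 30)
#winner 1015, loser 985
```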
Because the elo function represents one iteration, and we need to account for time (i.e., the order of the rankings), we need to run many iterations to allow the Elo scores to converge. We could preset the number of iterations and run the function \(x\) times. However, larger squads take more iterations to converge, and imagine having to shuffle and recalculate Elo scores 1000 times for hundreds of squads at every summer training event: it is computationally costly.
To address this issue, developers at West Point terminate the iterations based on a convergence criterion. Although we are not certain of their methods, we simulate said convergence with a confidence band whose half-width and length are chosen so that the probability that the running average of Elo scores falls outside the band is practically null. The ‘confidence band’ method is outlined in a paper by Mustafa Ata and produces more efficient computations. We set the parameters epsilon and zeta to the half-width and length of our confidence band. If we set epsilon to .5 and zeta to 50, then we are telling the function to repeat iterations until the running-average Elo score for each squad member stays within a band of width 1 for 50 consecutive iterations. We implement the convergence method below.
elo_converge <- function(truth_table_list, k, epsilon, zeta) {
  #run one iteration so we have a starting point
  init <- elo(truth_table_list, k)
  #init matrices to hold raw scores and running averages
  raw_scores <- matrix(init, ncol = length(init), byrow = TRUE)
  colnames(raw_scores) <- names(truth_table_list)
  running_avg <- matrix(init, ncol = length(init), byrow = TRUE)
  colnames(running_avg) <- names(truth_table_list)
  #calculate the upper and lower limits of the confidence band
  u_lim <- init + epsilon
  l_lim <- init - epsilon
  #initialize z to count consecutive in-band iterations
  z <- rep(0, length(names(truth_table_list)))
  #init iteration counter
  j <- 1
  #as long as we have not converged
  while (any(z != zeta)) {
    #run elo one more time and update variables
    j <- j + 1
    res <- elo(truth_table_list, k)
    raw_scores <- rbind(raw_scores, res)
    #running average of the raw scores over all iterations so far
    running_avg <- rbind(running_avg, colMeans(raw_scores))
    #if we are within the confidence band, add to z
    if (all(running_avg[j, ] >= l_lim & running_avg[j, ] <= u_lim)) {
      z <- z + 1
    #if we are outside the confidence band, reset z and re-center the limits
    } else {
      z <- rep(0, length(names(truth_table_list)))
      u_lim <- running_avg[j, ] + epsilon
      l_lim <- running_avg[j, ] - epsilon
    }
  }
  #store the matrices in a list and return it
  lst <- list(raw_scores, running_avg, j)
  names(lst) <- c("raw_scores", "running_avg", "iterations")
  return(lst)
}
Using the proposed function, we can rank our hypothetical squad from the first chunk and form some interesting visualizations to characterize the Elo scoring simulation process.
First, let’s run the simulation for the hypothetical squad and plot the running average across each iteration to see how the score converges for each squad member.
library(ggplot2)
library(tidyr)
library(dplyr)
library(plotly)
#run simulation
sim_results <- elo_converge(truth_list, 30, .5, 50)
iterations <- sim_results[["iterations"]]
#make a dataframe from the running averages
df1 <- data.frame(sim_results[["running_avg"]])
colnames(df1) <- team
#elongate df for plotting purposes
df_long <- df1 %>% gather() %>%
  mutate(iteration = rep(1:iterations, length(team)))
#make plot
p1 <- ggplot(df_long) +
  geom_line(aes(x = iteration, y = value, color = key)) +
  theme_minimal() + xlab("Iterations") + ylab("Team Member Score") +
  ggtitle("Score Convergence")
ggplotly(p1)
As we can see from the plot above, the scores appear to converge quickly, and there is no crossing between team members’ scores. The plot is interactive, however, and if we zoom in on each individual line we can tell there is some fluctuation in the running average before convergence. Each running average looks well separated because we designed the truth matrices to give a clear ranking for each member. Below, we determine the final rankings using the last iteration’s running-average score. We encourage you to play around with different truth matrices to see how these plots change.
df <- data.frame(avg = sim_results[["running_avg"]][nrow(sim_results[["running_avg"]]), ],
                 name = colnames(sim_results[["running_avg"]]))
p2 <- ggplot(df) +
  geom_bar(aes(x = reorder(name, avg), y = avg), stat = 'identity', alpha = .75) +
  theme_minimal() + coord_flip() +
  ggtitle("Hypothetical Squad Rankings") +
  xlab("Squad Member") + ylab("Elo Score") +
  theme(legend.position = 'none')
ggplotly(p2)
As discussed in the longer write-up, there are many applications and implementations of sentiment analysis. In this document, we will review sentiment analysis using the sentimentr package in R. As opposed to supervised sentiment analysis methods, sentimentr uses an augmented dictionary lookup to score each piece of text. Each word is associated with some sentiment, and sentimentr combines the sentiment of the words in a sentence in a smart way to calculate a sentence-level score. In its dictionary lookup, sentimentr accounts for valence shifters. A valence shifter changes the sentiment of a given word, and there are four types: negators, amplifiers, de-amplifiers, and adversative conjunctions. Consider the sentence “I like pizza.” With a negator the sentence becomes “I do not like pizza,” which makes the sentence negative. An amplifier makes the sentence “I really like pizza,” which increases the positive sentiment. A de-amplifier makes the sentence “I hardly like pizza,” which decreases the positive sentiment. Finally, an adversative conjunction makes the sentence “I like pizza but it’s not worth 1 million dollars,” which overrules the clause “I like pizza” and makes the sentence more negative.
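The pizza sentences above can be scored directly with sentimentr. A minimal sketch (the exact scores depend on sentimentr's dictionaries, so we only describe the expected signs and directions rather than precise values):

```r
library(sentimentr)

pizza <- c("I like pizza",
           "I do not like pizza",
           "I really like pizza",
           "I hardly like pizza",
           "I like pizza but it's not worth 1 million dollars")
#one sentiment score per sentence: the negator should flip the sign,
#the amplifier should increase the score, the de-amplifier should shrink it,
#and the adversative conjunction should drag the score down
sentiment(get_sentences(pizza))
```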
Now, we will provide a quick showcase of sentimentr, which outputs a score centered at 0 for each sentence: positive scores indicate positive sentiment and negative scores indicate negative sentiment.
#Load Package
library(sentimentr)
#Provide some example peer review sentences!
good <- "I enjoyed working with Jeff he was the strongest soldier in my squad"
bad <- "Nico consistently goofed off during training which made him a bad teammate"
great <- "I really love Joe, he did such a fantastic job, he is very big and strong!"
middle <- "It was against regulation for samar to bring whiskey but it was fun"
reviews <- c(good, bad, great, middle)
#The package provides a simple workflow
sentences <- get_sentences(reviews)
sent_df <- sentiment(sentences)
#add the review text to the sentiment df
sent_df$reviews <- reviews
#add a positive/negative class
sent_df <- sent_df %>%
  mutate(classification = ifelse(sentiment > 0, "positive", "negative"))
p3 <- ggplot(sent_df) +
  geom_bar(aes(x = reorder(reviews, sentiment), y = sentiment, fill = classification),
           stat = 'identity', alpha = .75) +
  theme_minimal() + coord_flip() +
  ggtitle("Hypothetical Sentiment Analysis") +
  xlab("Review") + ylab("Sentiment") +
  theme(legend.position = 'none')
ggplotly(p3)
We encourage you to try your own example sentences. Can you break sentimentr’s sentiment analysis method?